A bag of tricks for real-time Mitotic Figure detection

Marzahl, Christian, Napora, Brian

arXiv.org Artificial Intelligence

Mitotic figure (MF) detection in histopathology images is challenging due to large variations in slide scanners, staining protocols, tissue types, and the presence of artifacts. This paper presents a collection of training techniques - a bag of tricks - that enable robust, real-time MF detection across diverse domains. We build on the efficient RTMDet single-stage object detector to achieve high inference speed suitable for clinical deployment. Our method addresses scanner variability and tumor heterogeneity via extensive multi-domain training data, balanced sampling, and careful augmentation. Additionally, we employ targeted hard-negative mining on necrotic and debris tissue to reduce false positives. In a grouped 5-fold cross-validation across multiple MF datasets, our model achieves an F1 score between 0.78 and 0.84. On the preliminary test set of the MItosis DOmain Generalization (MIDOG) 2025 challenge, our single-stage RTMDet-S-based approach reaches an F1 of 0.81, outperforming larger models and demonstrating adaptability to new, unfamiliar domains. The proposed solution offers a practical trade-off between accuracy and speed, making it attractive for real-world clinical adoption.
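The hard-negative mining step described in this abstract can be sketched as a simple loop over mitosis-free tissue (e.g. necrosis and debris): every confident detection there is, by construction, a false positive and can be replayed to the model as an extra negative example. This is a minimal sketch, not the paper's implementation; the `detector(patch) -> [(box, score), ...]` callable and all names are illustrative assumptions.

```python
def mine_hard_negatives(detector, negative_patches, score_thresh=0.5):
    """Collect confident detections on patches known to contain no mitotic
    figures; each one is a false positive worth reusing as a hard negative."""
    hard_negatives = []
    for patch in negative_patches:
        # Assumed interface: detector(patch) yields (box, score) pairs.
        for box, score in detector(patch):
            if score >= score_thresh:
                hard_negatives.append((patch, box, score))
    # Most confident mistakes first, so a capped sample keeps the worst offenders.
    hard_negatives.sort(key=lambda item: item[2], reverse=True)
    return hard_negatives
```

In practice the returned triples would be folded back into the training set as additional background crops for the next training round.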


From Pixels to Histopathology: A Graph-Based Framework for Interpretable Whole Slide Image Analysis

Weers, Alexander, Berger, Alexander H., Lux, Laurin, Schüffler, Peter, Rueckert, Daniel, Paetzold, Johannes C.

arXiv.org Artificial Intelligence

The histopathological classification of whole-slide images (WSIs) is a fundamental task in digital pathology, yet it requires extensive time and expertise from specialists. While deep learning methods show promising results, they typically process WSIs by dividing them into artificial patches, which inherently prevents a network from learning from the entire image context, disregards natural tissue structures, and compromises interpretability. Our method overcomes this limitation through a novel graph-based framework that constructs WSI graph representations. The WSI-graph efficiently captures essential histopathological information in a compact form. We build tissue representations (nodes) that follow biological boundaries rather than arbitrary patches, while providing interpretable features for explainability. Through adaptive graph coarsening guided by learned embeddings, we progressively merge regions while maintaining discriminative local features and enabling efficient global information exchange. In our method's final step, we solve the diagnostic task through a graph attention network. We empirically demonstrate strong performance on multiple challenging tasks such as cancer stage classification and survival prediction, while also identifying predictive factors using Integrated Gradients. Our implementation is publicly available at https://github.com/HistoGraph31/pix2pathology
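The embedding-guided coarsening idea in this abstract can be illustrated with a toy round of merging: adjacent nodes whose embeddings are sufficiently similar (cosine similarity above a threshold) are fused, and each merged region gets the mean embedding of its members. This is a minimal sketch under those assumptions, not the paper's algorithm; thresholds and the union-find grouping are illustrative choices.

```python
import numpy as np

def coarsen_once(embeddings, edges, sim_thresh=0.9):
    """One round of embedding-guided coarsening: merge adjacent node pairs
    whose cosine similarity exceeds sim_thresh, via union-find grouping."""
    n = len(embeddings)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    for i, j in edges:
        if unit[i] @ unit[j] >= sim_thresh:
            parent[find(i)] = find(j)

    # New node embeddings are the mean over each merged group.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    merged = np.stack([embeddings[idx].mean(axis=0) for idx in groups.values()])
    labels = {root: k for k, root in enumerate(groups)}
    assignment = [labels[find(i)] for i in range(n)]
    return merged, assignment
```

Repeating such rounds progressively shrinks the graph while similar tissue regions collapse into single nodes, which is the intuition behind coarsening "guided by learned embeddings".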


SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding

Chen, Ying, Wang, Guoan, Ji, Yuanfeng, Li, Yanjun, Ye, Jin, Li, Tianbin, Zhang, Bin, Pei, Nana, Yu, Rongshan, Qiao, Yu, He, Junjun

arXiv.org Artificial Intelligence

Despite the progress made by multimodal large language models (MLLMs) in computational pathology, they remain limited by a predominant focus on patch-level analysis, missing essential contextual information at the whole-slide level. The lack of large-scale instruction datasets and the gigapixel scale of whole slide images (WSIs) pose significant developmental challenges. In this paper, we present SlideChat, the first vision-language assistant capable of understanding gigapixel whole-slide images, exhibiting excellent multimodal conversational capability and responding to complex instructions across diverse pathology scenarios. To support its development, we created SlideInstruction, the largest instruction-following dataset for WSIs, consisting of 4.2K WSI captions and 176K VQA pairs spanning multiple categories. Furthermore, we propose SlideBench, a multimodal benchmark that incorporates captioning and VQA tasks to assess SlideChat's capabilities in varied clinical settings such as microscopy and diagnosis. Compared to both general and specialized MLLMs, SlideChat exhibits exceptional capabilities, achieving state-of-the-art performance on 18 of 22 tasks. For example, it achieved an overall accuracy of 81.17% on SlideBench-VQA (TCGA), and 54.15% on SlideBench-VQA (BCNB). We will fully release SlideChat, SlideInstruction and SlideBench as open-source resources to facilitate research and development in computational pathology.


Benchmarking Hierarchical Image Pyramid Transformer for the classification of colon biopsies and polyps in histopathology images

Contreras, Nohemi Sofia Leon, D'Amato, Marina, Ciompi, Francesco, Grisi, Clement, Aswolinskiy, Witali, Vatrano, Simona, Fraggetta, Filippo, Nagtegaal, Iris

arXiv.org Artificial Intelligence

Training neural networks with high-quality pixel-level annotation in histopathology whole-slide images (WSI) is an expensive process due to the gigapixel resolution of WSIs. However, recent advances in self-supervised learning have shown that highly descriptive image representations can be learned without the need for annotations. We investigate the application of the recent Hierarchical Image Pyramid Transformer (HIPT) model for the specific task of classifying colorectal biopsies and polyps. After evaluating the effectiveness of TCGA-learned features in the original HIPT model, we incorporate colon biopsy image information into HIPT's pretraining using two distinct strategies: (1) fine-tuning HIPT from the existing TCGA weights and (2) pretraining HIPT from random weight initialization. We compare the performance of these pretraining regimes on two colorectal biopsy classification tasks: binary and multiclass classification.


Hierarchical Vision Transformers for Context-Aware Prostate Cancer Grading in Whole Slide Images

Grisi, Clément, Litjens, Geert, van der Laak, Jeroen

arXiv.org Artificial Intelligence

Vision Transformers (ViTs) have ushered in a new era in computer vision, showcasing unparalleled performance in many challenging tasks. However, their practical deployment in computational pathology has largely been constrained by the sheer size of whole slide images (WSIs), which result in lengthy input sequences. Transformers faced a similar limitation when applied to long documents, and Hierarchical Transformers were introduced to circumvent it. Given the analogous challenge with WSIs and their inherent hierarchical structure, Hierarchical Vision Transformers (H-ViTs) emerge as a promising solution in computational pathology. This work delves into the capabilities of H-ViTs, evaluating their efficiency for prostate cancer grading in WSIs. Our results show that they achieve competitive performance against existing state-of-the-art solutions.


Publicly available datasets of breast histopathology H&E whole-slide images: A scoping review

Tafavvoghi, Masoud, Bongo, Lars Ailo, Shvetsov, Nikita, Busund, Lill-Tove Rasmussen, Møllersen, Kajsa

arXiv.org Artificial Intelligence

Advancements in digital pathology and computing resources have made a significant impact in the field of computational pathology for breast cancer diagnosis and treatment. However, access to high-quality labeled histopathological images of breast cancer remains a major challenge that limits the development of accurate and robust deep learning models. In this scoping review, we identified the publicly available datasets of breast H&E stained whole-slide images (WSI) that can be used to develop deep learning algorithms. We systematically searched nine scientific literature databases and nine research data repositories and found 17 publicly available datasets containing 10,385 H&E WSIs of breast cancer. Moreover, we reported image metadata and characteristics for each dataset to assist researchers in selecting proper datasets for specific tasks in breast cancer computational pathology. In addition, we compiled two lists of breast H&E patches and private datasets as supplementary resources for researchers. Notably, only 28% of the included articles utilized multiple datasets, and only 14% used an external validation set, suggesting that the performance of other developed models may be susceptible to overestimation. The TCGA-BRCA dataset was used in 52% of the selected studies. This dataset has a considerable selection bias that can impact the robustness and generalizability of the trained algorithms. There is also a lack of consistent metadata reporting for breast WSI datasets, which can be an issue in developing accurate deep learning models, indicating the necessity of establishing explicit guidelines for documenting breast WSI dataset characteristics and metadata.


Comments on 'Fast and scalable search of whole-slide images via self-supervised deep learning'

Sikaroudi, Milad, Afshari, Mehdi, Shafique, Abubakr, Kalra, Shivam, Tizhoosh, H. R.

arXiv.org Artificial Intelligence

Chen et al. [Chen2022] recently published the article "Fast and scalable search of whole-slide images via self-supervised deep learning" in Nature Biomedical Engineering. The authors call their method "self-supervised image search for histology", short SISH. The paper is not easily readable, and many important details are buried under ambiguous descriptions. Incremental modification of Yottixel: Yottixel introduced the concept of "mosaic" through a customized clustering and selection process [Kalra2020a]. While Chen et al. frequently mention "Yottixel" and "mosaic," they acknowledge only once that they have followed Yottixel's mosaic generation process.


Cascaded Cross-Attention Networks for Data-Efficient Whole-Slide Image Classification Using Transformers

Khader, Firas, Kather, Jakob Nikolas, Han, Tianyu, Nebelung, Sven, Kuhl, Christiane, Stegmaier, Johannes, Truhn, Daniel

arXiv.org Artificial Intelligence

Whole-Slide Imaging allows for the capture and digitization of high-resolution images of histological specimens. An automated analysis of such images using deep learning models is therefore in high demand. The transformer architecture has been proposed as a possible candidate for effectively leveraging the high-resolution information. Here, the whole-slide image is partitioned into smaller image patches and feature tokens are extracted from these image patches. However, while the conventional transformer allows for the simultaneous processing of a large set of input tokens, its computational demand scales quadratically with the number of input tokens and thus quadratically with the number of image patches. To address this problem we propose a novel cascaded cross-attention network (CCAN) based on the cross-attention mechanism that scales linearly with the number of extracted patches. Our experiments demonstrate that this architecture is at least on par with, and in some cases outperforms, other attention-based state-of-the-art methods on two public datasets: On the use-case of lung cancer (TCGA NSCLC) our model reaches a mean area under the receiver operating characteristic (AUC) of 0.970 $\pm$ 0.008 and on renal cancer (TCGA RCC) reaches a mean AUC of 0.985 $\pm$ 0.004. Furthermore, we show that our proposed model is efficient in low-data regimes, making it a promising approach for analyzing whole-slide images in resource-limited settings. To foster research in this direction, we make our code publicly available on GitHub: XXX.
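The linear-scaling argument in this abstract can be made concrete with a toy cross-attention step: a small, fixed set of L learned latent tokens attends to all N patch tokens, so the attention matrix is (L, N) rather than (N, N), and per-layer cost grows linearly in N. This is a minimal NumPy sketch of the general mechanism, not the paper's CCAN architecture; shapes and names are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latents, patches):
    """latents: (L, d) learned queries; patches: (N, d) patch tokens.
    The attention matrix is (L, N), so cost is O(N * L * d) - linear in N."""
    d = latents.shape[-1]
    attn = softmax(latents @ patches.T / np.sqrt(d))  # (L, N)
    return attn @ patches                             # (L, d) summary tokens
```

With L held constant (say, tens of tokens) and several such layers cascaded, adding more patches to a slide increases the cost only linearly, which is what makes the approach tractable for gigapixel WSIs.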


NephroNet: A Novel Program for Identifying Renal Cell Carcinoma and Generating Synthetic Training Images with Convolutional Neural Networks and Diffusion Models

Sabharwal, Yashvir

arXiv.org Artificial Intelligence

Renal cell carcinoma (RCC) is a type of cancer that originates in the kidneys and is the most common type of kidney cancer in adults. It can be classified into several subtypes, including clear cell RCC, papillary RCC, and chromophobe RCC. In this study, an artificial intelligence model was developed and trained to classify different subtypes of RCC using ResNet-18, a convolutional neural network that has been widely used for image classification tasks. The model was trained on a dataset of RCC histopathology images, which consisted of digital images of RCC surgical resection slides annotated with the corresponding subtype labels. The performance of the trained model was evaluated using several metrics, including accuracy, precision, and recall. Additionally, in this research, a novel synthetic image generation tool, NephroNet, is developed on diffusion models and used to generate original images of RCC surgical resection slides. Diffusion models are a class of generative models capable of synthesizing high-quality images from noise. Several diffusion pipelines, such as Stable Diffusion, Dreambooth Text-to-Image, and Textual Inversion, were trained on a dataset of RCC images and used to generate a series of original images that resembled RCC surgical resection slides, each in under four seconds. The generated images were visually realistic and could be used for creating new training datasets, testing the performance of image analysis algorithms, and training medical professionals. NephroNet is provided as an open-source software package and contains files for data preprocessing, training, and visualization. Overall, this study demonstrates the potential of artificial intelligence and diffusion models for classifying and generating RCC images, respectively. These methods could be useful for improving the diagnosis and treatment of RCC and beyond.


Contextual learning is nearly all you need

#artificialintelligence

In another article, Sunghoon Kwon and colleagues show (Y. Lee et al. [...]). As explained by Faisal Mahmood and co-authors in an associated News & Views (G. [...]). In another research article in this issue, Mahmood and colleagues show another application of self-supervised learning: searching and retrieving gigapixel whole-slide images (Figure 1) at speeds that are independent of the size of the repository (C. [...]). To search for a tissue patch, rather than querying against every slide in the dataset, a variational autoencoder (a probabilistic generative model that learns latent representations of the data) is trained to represent selected patches from each slide as a set of codes. The patches most likely to match the query can then be retrieved using uncertainty-based ranking together with a tree data structure, for speed and scalability.
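The repository-size-independent search described above can be illustrated with a toy ordered index over integer patch codes: lookups inspect only the neighborhood of the query's insertion point, so query time depends on the number of neighbors requested, not on how many slides are stored. This is a minimal sketch of the general idea only; the sorted list stands in for the tree structure, and a trained VAE would supply the codes in the actual method.

```python
import bisect

class CodeIndex:
    """Toy ordered index over integer patch codes (a stand-in for the
    described tree data structure; codes would come from a trained VAE)."""

    def __init__(self):
        self.codes = []   # sorted integer codes
        self.slides = []  # slide id per code, kept aligned with self.codes

    def add(self, code, slide_id):
        pos = bisect.bisect_left(self.codes, code)
        self.codes.insert(pos, code)
        self.slides.insert(pos, slide_id)

    def query(self, code, k=3):
        # Examine only neighbors of the insertion point, so lookup cost
        # depends on k rather than on the repository size.
        pos = bisect.bisect_left(self.codes, code)
        cands = range(max(0, pos - k), min(len(self.codes), pos + k))
        ranked = sorted(cands, key=lambda i: abs(self.codes[i] - code))
        return [self.slides[i] for i in ranked[:k]]
```

A real system would rank candidates by model uncertainty rather than raw code distance, but the locality of the lookup is what decouples search time from repository size.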